

Decoupled Context Processing for Context Augmented Language Modeling

Li, Zonglin

Neural Information Processing Systems

Language models can be augmented with a context retriever to incorporate knowledge from large external databases. By leveraging retrieved context, the neural network does not have to memorize the massive amount of world knowledge within its internal parameters, leading to better parameter efficiency, interpretability and modularity.





Enhanced Transformer architecture for in-context learning of dynamical systems

Rufolo, Matteo, Piga, Dario, Maroni, Gabriele, Forgione, Marco

arXiv.org Artificial Intelligence

Recently introduced by some of the authors, the in-context identification paradigm aims at estimating, offline and based on synthetic data, a meta-model that describes the behavior of a whole class of systems. Once trained, this meta-model is fed with an observed input/output sequence (context) generated by a real system to predict its behavior in a zero-shot learning fashion. In this paper, we enhance the original meta-modeling framework through three key innovations: by formulating the learning task within a probabilistic framework; by managing non-contiguous context and query windows; and by adopting recurrent patching to effectively handle long context sequences. The efficacy of these modifications is demonstrated through a numerical example focusing on the Wiener-Hammerstein system class, highlighting the model's enhanced performance and scalability.
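The recurrent-patching idea for long context sequences can be sketched in a few lines. This is a toy illustration, not the paper's architecture: the patch encoder and the recurrence below (a simple exponential blend with a hypothetical `alpha`) are placeholders for the learned components, and the context is a made-up input/output sequence.

```python
# Toy sketch of recurrent patching: a long (u, y) context sequence is
# split into fixed-length patches, each patch is summarized, and a
# fixed-size state is updated recurrently over the patch summaries.

PATCH_LEN = 4

def summarize(patch):
    # placeholder patch encoder: mean of the input/output pairs
    n = len(patch)
    return (sum(u for u, _ in patch) / n, sum(y for _, y in patch) / n)

def recurrent_patching(context, state=(0.0, 0.0), alpha=0.5):
    """Fold patch summaries into a fixed-size state (toy recurrence)."""
    for i in range(0, len(context), PATCH_LEN):
        s = summarize(context[i:i + PATCH_LEN])
        state = tuple(alpha * st + (1 - alpha) * sv
                      for st, sv in zip(state, s))
    return state

# synthetic context: y = 2u, eight samples -> two patches
context = [(float(t), 2.0 * t) for t in range(8)]
state = recurrent_patching(context)
```

However long the observed sequence grows, the meta-model only ever sees the fixed-size state, which is what makes the scheme scale to long contexts.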


DataSculpt: Crafting Data Landscapes for LLM Post-Training through Multi-objective Partitioning

Lu, Keer, Liang, Zheng, Nie, Xiaonan, Pan, Da, Zhang, Shusen, Zhao, Keshi, Chen, Weipeng, Zhou, Zenan, Dong, Guosheng, Zhang, Wentao, Cui, Bin

arXiv.org Artificial Intelligence

The effectiveness of long-context modeling is important for Large Language Models (LLMs) in various applications. Despite their potential, LLMs' efficacy in processing long context does not consistently meet expectations, posing significant challenges for efficient management of prolonged sequences in training. This difficulty is compounded by the scarcity of comprehensive and diverse training datasets suitable for long sequences, which stems from inherent length biases across different data sources, and the logistical complexities associated with massive data management for training in extended contexts. In this work, we introduce DataSculpt, a data construction framework designed to strategically augment the data architecture for extended-context training. Our thorough evaluations demonstrate DataSculpt's remarkable capacity to boost long-context training performance, achieving improvements including an 18.09% increase in retrieval augmentation, 21.23% in summarization, 21.27% in reading comprehension, and a 3.81% rise in code completion, all while preserving the models' overall proficiency with a 4.88% improvement.
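The multi-objective partitioning idea can be illustrated with a toy greedy packer. This is a hedged sketch, not DataSculpt itself: the two objectives (filling each fixed-length training sequence, and mixing data sources to counter length bias) and the weights `w_fill`/`w_div` are stand-ins for the framework's actual scoring.

```python
# Toy sketch: greedily pack variable-length documents into fixed-budget
# training sequences, scoring candidate bins by a weighted combination
# of fill ratio (sequence utilization) and source diversity.

SEQ_LEN = 16

def pack(docs, w_fill=1.0, w_div=0.5):
    bins = []  # each bin: list of (length, source) documents
    for length, source in sorted(docs, reverse=True):
        best, best_score = None, None
        for b in bins:
            used = sum(l for l, _ in b)
            if used + length > SEQ_LEN:
                continue  # would overflow the context window
            fill = (used + length) / SEQ_LEN            # prefer full sequences
            div = len({s for _, s in b} | {source})     # prefer mixed sources
            score = w_fill * fill + w_div * div
            if best_score is None or score > best_score:
                best, best_score = b, score
        if best is None:
            bins.append([(length, source)])
        else:
            best.append((length, source))
    return bins

docs = [(10, "web"), (6, "code"), (5, "web"), (4, "book"), (3, "code")]
bins = pack(docs)
```

Every document lands in exactly one bin and no bin exceeds the sequence budget, so the packed bins can be concatenated directly into training sequences.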


Stochastic contextual bandits with graph feedback: from independence number to MAS number

Wen, Yuxiao, Han, Yanjun, Zhou, Zhengyuan

arXiv.org Artificial Intelligence

We consider contextual bandits with graph feedback, a class of interactive learning problems with richer structures than vanilla contextual bandits, where taking an action reveals the rewards for all neighboring actions in the feedback graph under all contexts. Unlike the multi-armed bandits setting where a growing literature has painted a near-complete understanding of graph feedback, much remains unexplored in the contextual bandits counterpart. In this paper, we make inroads into this inquiry by establishing a regret lower bound $\Omega(\sqrt{\beta_M(G) T})$, where $M$ is the number of contexts, $G$ is the feedback graph, and $\beta_M(G)$ is our proposed graph-theoretical quantity that characterizes the fundamental learning limit for this class of problems. Interestingly, $\beta_M(G)$ interpolates between $\alpha(G)$ (the independence number of the graph) and $\mathsf{m}(G)$ (the maximum acyclic subgraph (MAS) number of the graph) as the number of contexts $M$ varies. We also provide algorithms that achieve near-optimal regrets for important classes of context sequences and/or feedback graphs, such as transitively closed graphs that find applications in auctions and inventory control. In particular, with many contexts, our results show that the MAS number completely characterizes the statistical complexity for contextual bandits, as opposed to the independence number in multi-armed bandits.
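The two endpoints that $\beta_M(G)$ interpolates between can be computed by brute force on a tiny example. The graph below (a directed 3-cycle plus an isolated vertex) is an illustrative choice of ours, not from the paper; for directed feedback graphs we take an independent set to contain no edge in either direction, and the MAS number to be the largest vertex subset inducing an acyclic subgraph.

```python
# Brute-force alpha(G) (independence number) and m(G) (maximum acyclic
# subgraph number) for a small directed feedback graph.

from itertools import combinations

def is_independent(S, edges):
    return not any(u in S and v in S for u, v in edges)

def is_acyclic(S, edges):
    sub = [(u, v) for u, v in edges if u in S and v in S]
    # Kahn's algorithm on the induced subgraph
    indeg = {v: 0 for v in S}
    for _, v in sub:
        indeg[v] += 1
    frontier = [v for v in S if indeg[v] == 0]
    seen = 0
    while frontier:
        u = frontier.pop()
        seen += 1
        for a, b in sub:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    frontier.append(b)
    return seen == len(S)  # all vertices removable => no cycle

def alpha_and_mas(n, edges):
    best_a = best_m = 0
    for k in range(n, 0, -1):
        for S in map(set, combinations(range(n), k)):
            if best_a < k and is_independent(S, edges):
                best_a = k
            if best_m < k and is_acyclic(S, edges):
                best_m = k
    return best_a, best_m

# directed 3-cycle 0 -> 1 -> 2 -> 0, plus an isolated vertex 3
edges = [(0, 1), (1, 2), (2, 0)]
a, m = alpha_and_mas(4, edges)
```

Here $\alpha(G) = 2$ while $\mathsf{m}(G) = 3$: dropping any one vertex of the cycle leaves an acyclic subgraph, so the MAS number exceeds the independence number, which is exactly the gap the abstract's lower bound exploits as the number of contexts grows.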


Towards Generalizable Reinforcement Learning for Trade Execution

Zhang, Chuheng, Duan, Yitong, Chen, Xiaoyu, Chen, Jianyu, Li, Jian, Zhao, Li

arXiv.org Artificial Intelligence

Optimized trade execution aims to sell (or buy) a given amount of assets within a given time at the lowest possible trading cost. Recently, reinforcement learning (RL) has been applied to optimized trade execution to learn smarter policies from market data. However, we find that many existing RL methods exhibit considerable overfitting which prevents them from real deployment. In this paper, we provide an extensive study on the overfitting problem in optimized trade execution. First, we model the optimized trade execution as offline RL with dynamic context (ORDC), where the context represents market variables that cannot be influenced by the trading policy and are collected in an offline manner. Under this framework, we derive the generalization bound and find that the overfitting issue is caused by large context space and limited context samples in the offline setting. Accordingly, we propose to learn compact representations for context to address the overfitting problem, either by leveraging prior knowledge or in an end-to-end manner. To evaluate our algorithms, we also implement a carefully designed simulator based on historical limit order book (LOB) data to provide a high-fidelity benchmark for different algorithms. Our experiments on the high-fidelity simulator demonstrate that our algorithms can effectively alleviate overfitting and achieve better performance.
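The "compact representation via prior knowledge" route can be sketched as follows. Everything here is hypothetical scaffolding of ours, not the paper's model: the hand-crafted features (average spread, mid-price drift), the TWAP-style baseline, and the tilt coefficients are illustrative stand-ins for a learned context encoder and policy.

```python
# Toy sketch: instead of conditioning the execution policy on raw LOB
# snapshots (a large context space prone to overfitting), a small
# prior-knowledge encoder collapses them to a few summary statistics.

def encode_context(lob_snapshots):
    """Collapse (bid, ask) snapshots into compact context features."""
    mids = [(bid + ask) / 2 for bid, ask in lob_snapshots]
    spread = sum(ask - bid for bid, ask in lob_snapshots) / len(lob_snapshots)
    drift = mids[-1] - mids[0]  # recent mid-price movement
    return (spread, drift)

def policy(remaining_qty, steps_left, ctx):
    spread, drift = ctx
    base = remaining_qty / steps_left        # TWAP baseline schedule
    tilt = 0.1 * drift - 0.05 * spread       # hypothetical adjustment
    return max(0.0, base + tilt)             # quantity to trade this step

lob = [(99.0, 101.0), (99.5, 101.5), (100.0, 102.0)]
ctx = encode_context(lob)
qty = policy(remaining_qty=100.0, steps_left=4, ctx=ctx)
```

The policy only ever sees the two-dimensional code, which is the mechanism by which a compact context representation shrinks the effective context space in the generalization bound.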


Decoupled Context Processing for Context Augmented Language Modeling

Li, Zonglin, Guo, Ruiqi, Kumar, Sanjiv

arXiv.org Artificial Intelligence

Language models can be augmented with a context retriever to incorporate knowledge from large external databases. By leveraging retrieved context, the neural network does not have to memorize the massive amount of world knowledge within its internal parameters, leading to better parameter efficiency, interpretability and modularity. In this paper we examined a simple yet effective architecture for incorporating external context into language models based on decoupled Encoder-Decoder architecture. We showed that such a simple architecture achieves competitive results on auto-regressive language modeling and open domain question answering tasks. We also analyzed the behavior of the proposed model which performs grounded context transfer. Finally we discussed the computational implications of such retrieval augmented models.
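The computational appeal of decoupling can be sketched concretely: context passages are encoded once, offline, and the decoder consumes the cached encodings via cross-attention at query time. The encoder, the 2-d embeddings, and the database below are toy stand-ins of ours, not the paper's architecture.

```python
# Minimal sketch of decoupled context processing: encode the retrieval
# database once, then reuse the cached encodings for every query via a
# pure-Python cross-attention.

import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def encode(passage):
    # placeholder encoder: a fixed 2-d embedding per passage
    return [float(len(passage)), float(passage.count(" "))]

def cross_attend(query, cached):
    # attention weights over the precomputed context encodings
    scores = softmax([sum(q * k for q, k in zip(query, enc)) for enc in cached])
    d = len(cached[0])
    return [sum(w * enc[i] for w, enc in zip(scores, cached)) for i in range(d)]

# offline: encode the retrieval database once
db = ["paris is the capital of france", "the sky is blue"]
cached = [encode(p) for p in db]

# online: every decoding step reuses the cache
out = cross_attend([1.0, 0.0], cached)
```

Because the expensive encoding pass never depends on the query, its cost is amortized across all future queries; only the cheap cross-attention runs at decode time.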


Formal Algorithms for Transformers

Phuong, Mary, Hutter, Marcus

arXiv.org Artificial Intelligence

It covers what Transformers are (Section 6), how they are trained (Section 7), what they're used for (Section 3), their key architectural components (Section 5), tokenization (Section 4), and a preview of practical considerations (Section 8) and the most prominent models.
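The paper presents transformers as explicit pseudocode; one of its basic building blocks, single-query scaled dot-product attention, can be paraphrased in plain Python as below. This is our transcription, not the authors' notation, and uses lists in place of their tensor types.

```python
# Single-query scaled dot-product attention: score the query against
# each key, softmax the scores, and return the weighted sum of values.

import math

def attention(q, keys, values):
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
              for k in keys]
    m = max(scores)                       # subtract max for stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    w = [x / z for x in w]                # softmax weights
    return [sum(wi * v[j] for wi, v in zip(w, values))
            for j in range(len(values[0]))]

out = attention(q=[1.0, 0.0],
                keys=[[1.0, 0.0], [0.0, 1.0]],
                values=[[1.0], [0.0]])
```

The query aligns with the first key, so the output leans toward the first value, with the softmax keeping a nonzero weight on the second.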